[Bugfix][CPU] Fix RotaryEmbedding fallback causing gibberish with --enforce-eager#31643
Conversation
Code Review
This pull request addresses a critical bug that causes models to produce incoherent output when running on the CPU backend with --enforce-eager. The root cause was correctly identified: several CustomOp subclasses lacked a forward_cpu implementation, causing a fallback to C++ kernels that have behavioral inconsistencies with their PyTorch native counterparts. The fix, which involves adding explicit forward_cpu methods to RotaryEmbedding, RMSNorm, GemmaRMSNorm, and RMSNormGated that delegate to forward_native, is sound and directly resolves the issue. The changes are consistent, well-targeted, and ensure that the correct, native PyTorch implementations are used on the CPU, restoring model correctness. The implementation is clean and follows existing patterns in the codebase.
Add explicit forward_cpu methods to CustomOp subclasses that delegate to forward_native, ensuring the CPU backend uses the PyTorch native implementation instead of the buggy CPU C++ kernels when custom_ops='all'.

Classes fixed:
- RotaryEmbedding (rotary_embedding/base.py)
- RMSNorm (layernorm.py)
- GemmaRMSNorm (layernorm.py)
- RMSNormGated (layernorm.py)

Fixes vllm-project#31626

Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>
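The delegation pattern this commit describes can be sketched as follows. The method names (`forward_native`, `forward_cuda`, `forward_cpu`) follow vLLM's CustomOp convention, but this is a simplified stand-in, not the actual vLLM source:

```python
# Minimal sketch of the delegation pattern in the commit message above.
# Simplified stand-in for a vLLM CustomOp subclass, not the real code.
class CustomOpSketch:
    def forward_native(self, x):
        # Pure-PyTorch reference path (identity stand-in here).
        return x

    def forward_cuda(self, x):
        # Custom-kernel path; on the CPU backend this resolved to the
        # C++ CPU kernels, which is where the divergence came from.
        return x

    def forward_cpu(self, x):
        # The fix: explicitly route CPU execution to the native
        # implementation instead of the kernel fallback.
        return self.forward_native(x)


op = CustomOpSketch()
print(op.forward_cpu(3))  # delegates to forward_native
```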
/cc @ProExpertProg PTAL.
Hi @rickychen-infinirc Thanks for the catch. The root cause is that RMSNorm accepts non-contiguous inputs after #28103, but we didn't add that support to the CPU kernels. It's okay to fall back most custom ops to the torch native implementations, because they are not performance-critical on CPU and can be compiled by torch.compile in most cases. We can dispatch them in:

vllm/vllm/model_executor/custom_op.py Lines 69 to 71 in d5503ca

One special case is RoPE: I prefer to use the CPU custom kernel in eager mode.
- Change CustomOp.forward_cpu() default to forward_native instead of forward_cuda, as most CPU custom kernels are not performance-critical and can have compatibility issues
- Remove the redundant forward_cpu() from RMSNorm, GemmaRMSNorm, and RMSNormGated, since they now inherit the base class behavior

Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>
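The revised approach can be sketched as below: the base class defaults `forward_cpu()` to `forward_native()`, and only ops that deliberately keep a CPU custom kernel (RoPE, per the reviewer's preference) override it. This is a hedged, simplified model of vLLM's CustomOp hierarchy, not the actual source:

```python
# Simplified model of the revised CustomOp dispatch, not the real vLLM code.
class CustomOp:
    def forward_native(self, *args):
        raise NotImplementedError

    def forward_cpu(self, *args):
        # New default: fall back to the PyTorch-native path on CPU
        # instead of the old forward_cuda -> C++ CPU kernel chain.
        return self.forward_native(*args)


class RMSNormSketch(CustomOp):
    # No forward_cpu override needed any more: the base class
    # default already selects forward_native.
    def forward_native(self, x):
        return ("native", x)


class RotaryEmbeddingSketch(CustomOp):
    def forward_native(self, x):
        return ("native", x)

    def forward_cpu(self, x):
        # Special case: RoPE keeps its CPU custom kernel in eager
        # mode (stand-in return value).
        return ("cpu_kernel", x)
```

With this default, subclasses only opt in to a CPU kernel explicitly, rather than silently inheriting a CUDA-oriented fallback.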
@bigPYJ1151 Thanks for the review and the clarification on the root cause! I've updated the PR based on your suggestion.

Tested with Qwen3-0.6B on CPU with `--enforce-eager`.
Hi @rickychen-infinirc, the pre-commit checks have failed. Please run:

```
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch.
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>
…nforce-eager (vllm-project#31643) Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>
…nforce-eager (vllm-project#31643) Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Summary
Fix gibberish output on CPU backend when `--enforce-eager` is enabled.

Resolves #31626
When running vLLM on the CPU backend with `--enforce-eager`, models may produce incoherent or repetitive outputs.

This happens because `enforce_eager=True` sets `custom_ops="all"`, enabling CustomOp dispatch on CPU. In this mode, `CustomOp.dispatch_forward()` selects `forward_cpu()` implementations when available.

Several CustomOp subclasses did not define `forward_cpu()`, causing them to fall back to the base class behavior, which delegates to `forward_cuda()`. On CPU, this path invokes the C++ CPU kernels, whose behavior diverges from the PyTorch native implementations in certain cases, leading to incorrect computations and degraded output quality.

Root cause
Missing `forward_cpu()` implementations in:
- RotaryEmbedding
- RMSNorm
- GemmaRMSNorm
- RMSNormGated

Fix
Add explicit `forward_cpu()` methods that delegate to `forward_native()`.

This ensures that, when running on CPU with custom ops enabled, these layers consistently use the PyTorch native implementations, restoring correct behavior while keeping the existing execution model unchanged.
Test Plan
Tested with Qwen3-0.6B on CPU with `--enforce-eager`: